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A  New  Statistical  Model  for  the  Calibration  of 

Force  Sensors 

Charles  P.  Reeve 

Abstract 

The  National  Bureau  of  Standards  has  been  calibrating  force  sensors  for  many 
years.  The  objective  in  these  calibrations  is  to  determine  the  functional  relationship 
between  the  applied  load  and  the  sensor  response.  In  a  typical  calibration  several  runs 
are  made  in  which  identical  sequences  of  known  loads  are  applied  to  the  sensor.  The 
sensor  is  rotated  in  the  loading  machine  between  runs.  The  previous  method  of  analysis 
incorporated  a  quadratic  polynomial  model  which  was  fit  to  the  pooled  data.  The  new 
method  presented  here  fits  separate  polynomials  to  the  mean  data  and  between-run 
differences.  The  'best'  degrees  of  these  polynomials  are  automatically  determined  by 
algorithms  which  incorporate  statistical  tests.  As  a  result,  error  contributions  from 
several  sources  are  quantized.  Methods  for  computing  confidence  intervals  for  the 
'true'  sensor  response  and  for  a  new  observation  are  given,  and  methods  of  inverse 
prediction  (measurement  of  an  unknown  load)  based  on  these  intervals  are  illustrated. 

Key  words:  calibration  curve;  confidence  interval;  force  sensor;  inverse  prediction;  least 
squares  polynomial;  load  cell;  proving  ring;  statistical  model 

1      Introduction 

The  National  Bureau  of  Standards  (NBS)  has  been  calibrating  force  measuring  devices  for 
many  years.  The  current  status  of  these  calibrations  has  been  described  by  Mitchell  [13].  In 
that  paper  the  generic  term  'force  sensor'  is  used  to  include  '...  instruments  and  systems  that 
are  variously  referred  to  by  names  such  as  load  cell,  proving  ring,  force  gage,  force  link,  force 
transducer,  load  ring,  ring  dynamometer,  compression  dynamometer,  tension  dynamometer, 
and  crane  scale.'  Hereafter  the  generic  term  will  be  used  except  in  reference  to  a  specific 
type  of  sensor.  These  sensors  are  normally  used  to  measure  forces  in  the  10  to  1,000,000 
pound-force  (Ibf)  range.  Their  response  curves  typically  look  linear  to  the  eye  as  shown  in 
figure  1. 

The  objective  in  calibrating  a  force  sensor  is  to  determine  the  functional  relationship 
between  the  applied  load  and  the  sensor  response.  This  is  accomplished  in  the  laboratory 
by  applying  a  series  of  known  loads  to  the  sensor  and  observing  its  response  on  a  readout 


instrument.  The  calibration  is  usually  complicated  by  the  interaction  of  the  sensor  with  the 
loading  machine  as  discussed  by  Mitchell  and  Pontius  [15].  Also,  the  sensor  may  creep  or  drift 
under  sustained  load  (see  Mitchell  and  Baker  [14])  and  it  may  be  sensitive  to  atmospheric 
conditions  such  as  temperature  and  pressure.  A  general  discussion  of  the  above  problems 
and  others  is  given  by  Pontius  and  Mitchell  [19]. 

A  series  of  interlaboratory  comparisons  of  several  force  sensors  over  a  seven  year  period 
has  been  reported  by  Peterson,  Jenkins,  and  Mitchell  [18]  (following  a  preliminary  report 
by  Peterson  and  Bloss  [17]).  In  general  the  force  sensors  exhibited  reasonable  short  term 
stability,  but  there  was  some  evidence  of  long  term  drift  which  indicates  the  need  for  their 
periodic  recalibration. 

The  purpose  of  this  paper  is  to  present  a  new  statistical  model  for  the  force  sensor 
calibration  process  which  takes  into  account  errors  from  several  known  sources.  The  model, 
based  on  variable-degree  polynomials,  is  designed  to  extract  as  much  information  as  possible 
from  data  taken  in  the  traditional  manner.  By  quantifying  the  various  kinds  of  errors,  the 
underlying  processes  may  be  better  understood  as  more  calibration  results  are  analyzed  by 
the  new  model. 

In  developing  the  new  model  many  sets  of  force  sensor  data  were  analyzed.  As  more  data 
are  collected  and  analyzed  the  model  may  need  to  be  modified.  For  this  reason  the  methods 
in  this  paper  should  not  be  construed  as  the  ultimate  answer  to  the  force  sensor  calibration 
problem. 

The  mathematical  and  statistical  aspects  of  the  model  are  described  in  considerable 
detail  in  section  4.  A  nontechnical  outhne  of  the  new  model  is  given  in  section  3.  Some 
prehminary  discussions  of  other  aspects  of  force  sensor  cahbration  are  condensed  from  the 
previously  referenced  papers  and  given  in  section  2.  The  reader  would  do  well  to  be  familiar 
with  those  references,  especially  [13],  beforehand. 

An  example  with  real  data  is  given  in  section  5,  and  the  results  of  a  simulation  study 
based  on  this  example  are  reported  in  section  6.  A  question  of  optimality  in  the  choice  of 
applied  loads  is  considered  in  section  7. 

The  FORTRAN  computer  program  FCAL88  has  been  written  to  perform  the  analyses 
described  in  this  paper.  The  program  is  self  contained  and  has  been  successfully  run  on  both 
mainframe  and  personal  computers.  A  copy  of  the  program  can  be  made  available  to  the 
user  upon  request. 

2     Preliminaries 
2.1      Previous  Methods 

In  1946  Wilson,  Tate,  and  Borkowski  [25]  published  a  paper  on  the  calibration  of  proving 
rings  in  which  'calibration  factors',  the  ratio  of  the  load  to  the  corresponding  deflection, 
were  computed.  Corresponding  to  these  factors  were  'specification  limits'  which  served  as 
uncertainty  bounds.  The  factors  were  not  quite  constant  with  load,  thus  indicating  a  slight 


nonlinearity  in  the  proving  ring  response.  Errors  from  various  sources  and  loading  sequences 
were  also  discussed. 

In  1964  Hockersmith  and  Ku  [11]  developed  a  statistical  model  for  proving  rings  in  which 
the  response  (the  pooled  runs)  was  taken  to  be  a  quadratic  function  of  the  applied  load. 
A  method  for  computing  a  confidence  band  for  the  'true'  underlying  response  function  was 
given.  This  model  proved  satisfactory  and  was  incorporated  into  the  calibration  process  for 
both  proving  rings  and  load  cells. 

2.2      Loading  Sequences 

Many  force  sensors  can  be  calibrated  in  both  tension  and  compression  modes.  The  response 
is  expected  to  be  somewhat  different  in  each  mode.  Due  to  hysteresis  effects  the  response 
may  also  depend  on  whether  the  loads  are  applied  in  ascending  or  descending  order.  The 
device  may  thus  have  several  distinct  cahbration  curves. 

The  ASTM  Standard  [1],  currently  most  frequently  used  at  NBS,  requires  at  least  thirty 
measurements  over  the  working  range  of  the  device  during  calibration.  At  NBS  the  calibra- 
tion often  consists  of  two  runs  at  fifteen  loads  or  three  runs  at  ten  loads.  In  the  latter  case  the 
applied  loads  are  typically  at  10%,  20%,  30%,  ...,  90%,  and  100%  of  capacity.  Between  runs 
the  sensor  is  rotated  in  the  loading  machine.  This  rotation  allows  errors  from  misalignment 
to  be  sampled. 

It  is  necessary  to  make  measurements  at  zero  load  in  order  to  define  the  basehne  for  the 
readout  of  the  sensor.  Since  conditions  under  zero  load  can  be  different  than  under  nonzero 
loads,  those  readings  have  to  be  treated  differently.  The  manner  in  which  they  are  treated 
depends  somewhat  on  the  characteristics  of  the  loading  machine.  Some  deadweight  machines 
require  weights  to  be  added  sequentially  while  others  require  a  return  to  zero  between  loads. 
In  general  a  run  will  consist  of  a  series  of  readings  at  zero  and  nonzero  loads.  From  these 
readings  a  set  of  deflections  can  be  computed  for  each  of  the  nonzero  loads  adjusted  for 
instrument  drift  to  a  zero  baseline.  The  manner  of  adjustment  depends  on  the  theorized 
drift  behavior  of  the  instrument  between  zero  loads.  If  the  instrument  is  assumed  not  to 
drift  then  an  initial  zero  load  reading  is  subtracted  from  the  following  nonzero  load  readings. 
If  a  linear  drift  is  assumed  then  nonzero  load  readings  are  adjusted  proportionally  according 
to  'before'  and  'after'  zero  load  readings.  It  will  be  assumed  that  all  loads  are  equally  spaced 
in  time  so  that  drift  corrections  can  be  easily  computed.  It  is  imperative  that  the  same 
loading  sequence  be  used  for  each  run. 

In  [19]  a  quadratic  curve  is  said  to  be  fit  to  the  y—yo  data.  This  means  that  readings  y  were 
taken  at  various  loads  and  the  reading  of  the  instrument  under  zero  load  yo  was  subtracted 
from  it.  In  cases  where  a  particular  yo  value  or  a  combination  of  yo  values  is  subtracted  from 
more  than  one  y  value  the  resulting  y  —  yo  values  are  correlated.  Previous  fitting  procedures 
have  ignored  this  correlation  by  assuming  that  all  the  y  —  yo  were  independent.  An  attempt 
to  incorporate  these  correlations  into  the  new  model  was  made.  However,  the  computational 
complexities  introduced  were  felt  to  outweigh  the  aesthetic  value  in  'correctly'  modeling  the 


observations.  As  a  result  the  y  —  yo  values  will  continue  to  be  assumed  independent  in  the 
new  method  of  analysis. 

2.3      Sources  of  Error 

The  method  of  analysis  in  [1,11,19],  discussed  earher,  is  to  pool  all  the  runs  and  fit  a  quadratic 
curve  to  the  combined  data.  Upon  examining  hundreds  of  these  fits  to  load  cell  data  the 
two  most  often  seen  phenomena  are  a  serpentine  (S'-shaped)  curve  in  the  residuals,  shown  in 
figure  3,  and  a  fanning  out  of  the  residuals  from  each  run  as  shown  to  some  extent  in  figure 
8.  The  serpentine  curve  is  thought  to  be  the  result  of  model  error,  that  is,  the  quadratic 
curve  insufficiently  modeling  the  actual  underlying  sensor  characteristic.  At  this  point  there 
is  no  solid  theoretical  basis  for  stating  what  the  response  should  be. 

In  this  study  many  mathematical  models  were  tried  in  hopes  of  finding  a  general  curve 
which  would  adequately  fit  the  vast  majority  of  calibration  data.  These  models  include 
quadratic  and  higher  polynomials,  half  power  polynomials,  negative  power  polynomials,  and 
exponential,  sinusoidal,  logarithmic,  and  Bessel  functions.  In  specific  cases  each  of  these 
has  provided  a  good  fit,  but  none  has  been  found  satisfactory  in  general.  The  most  success 
was  had  with  quadratic,  cubic,  and  quartic  polynomials.  The  cubic  term  was  found  to  be 
the  most  effective  way  of  removing  the  serpentine  curve  in  many  cases.  The  decision  was 
thus  made  to  base  the  new  method  on  seeking  higher  order  polynomial  terms  which  will 
significantly  decrease  the  residual  error. 

The  fanning  out  of  residuals  is  due  to  between-run  error,  referred  to  in  [15]  as  consisting 
of  'zero  shift'  and  'load  proportional'  components.  This  error  is  believed  to  be  due  to  changes 
in  mechanical  alignment  (or  misalignment)  when  the  sensor  is  rotated,  with  the  magnitude 
of  the  error  being  dependent  on  the  sensor/ machine  combination.  Ideally,  the  net  applied 
load  is  parallel  and  symmetric  with  respect  to  the  axis  of  the  sensor,  and  the  sensor  response 
function  is  constant  over  time.  In  reality,  however,  small  components  of  non-axisymmetric 
forces  exist  whenever  a  load  is  applied.  The  sensor  response  depends  on  the  magnitude  of 
those  forces,  the  sensitivity  of  the  sensor  to  those  forces,  and  the  orientation  of  the  sensor  in 
the  loading  machine.  Between-run  errors  appear  to  take  the  form  of  separate  polynomials  of 
the  same  degree  but  possibly  different  coefficients. 

A  study  by  Mitchell,  Seifarth,  and  Reeve  [16]  showed  that  eccentric  (off-center)  loading 
results  in  sinusoidal  differences  in  response  as  a  function  of  angle  of  orientation  of  the  load 
eccentricity.  By  making  runs  with  the  sensor  equiangularly  spaced  around  the  circle,  the 
sinusoidal  components  of  error  tend  to  cancel  out  when  the  runs  are  averaged. 

Other  sources  of  error  are  generally  of  smaller  magnitude  and  easier  to  model.  The  ran- 
dom fluctuations  in  the  output  under  repeated  loads  of  the  same  magnitude  are  another 
kind  of  error.  These  may  be  due  to  many  separate  physical  processes  which  are  ordinarily 
not  measured  during  a  calibration,  for  example,  fluctuations  in  atmospheric  conditions,  vi- 
brations, and  oscillations  of  the  dead  weights.  The  effect  of  all  these  short  term  variations 
combine  to  give  what  looks  like  random  'noise'  and  can  be  called  within-run  error. 
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When  the  instrument  readout  is  too  coarse  a  roundoff  error  results.  This  type  of  error 
can  easily  mask  the  effects  of  other  errors. 

Finally,  errors  in  the  accepted  dead  weight  values  whether  due  to  calibration  error  or  air 
buoyancy  effects  are  a  source  of  systematic  error.  A  term  to  account  for  this  may  be  added 
at  the  end  of  the  calibration.  The  value  for  this  error  at  NBS  is  estimated  not  to  exceed 
0.002%.  This  term  is  not  added  in  the  ASTM  method  of  analysis. 

2.4      Preparation  for  Measurement 

The  force  sensor  is  mounted  in  the  loading  machine  in  either  the  tension  or  compression  mode. 
Several  'warm-up'  or  'pre-loading'  runs  are  then  made  in  order  to  stabilize  the  response  of 
the  device  and  reduce  hysteresis  effects. 

3     Features  of  This  Method 

The  purpose  of  this  section  is  to  describe  in  nontechnical  terms  how  the  new  method  of 
analysis  works.  The  reader  who  is  interested  in  the  details  of  the  statistical  model  and 
analysis  will  find  those  in  section  4.  The  five  main  features  of  the  method  are: 

1.  A  polynomial  response  model  is  fit  to  the  mean  data  (the  average  of  all  the  runs). 
Subject  to  certain  limitations  the  degree  of  this  polynomial  is  determined  from  the 
data.  The  upper  limit  on  the  degree  of  the  polynomial  is  set  by  the  user.  Figures  2, 
3,  and  4  show  residuals  from  linear,  quadratic,  and  cubic  fits  respectively  to  the  mean 
data  from  the  example  in  appendix  B  (note  the  different  vertical  scales).  The  cubic  fit 
was  determined  to  be  'best'.  In  this  case  the  upper  limit  on  the  degree  was  five. 

2.  A  polynomial  response  model  is  fit  to  each  set  of  difference  data  (the  difference  between 
the  individual  runs  and  the  mean  data).  Subject  to  certain  limitations  the  degree 
of  these  polynomials  is  automatically  determined  from  the  data.  The  degree  of  each 
polynomial  is  constrained  to  be  the  same,  and  as  before  the  upper  limit  on  that  degree 
is  set  by  the  user.  For  the  example  the  difference  data  is  shown  in  figures  5,  6,  and 
7  with  a  straight  line  fit  to  each  run.  The  combined  data  are  shown  in  figure  8.  The 
linear  fit  was  determined  to  be  'best'.  The  upper  limit  on  the  degree  was  three. 

3.  A  rough  error  budget  is  computed  for  each  of  the  five  sources  of  error  discussed  in 
section  2.3.  This  summary  may  prove  to  be  useful  in  correlating  the  error  structure  of 
the  data  with  the  underlying  physical  processes.  The  error  budget  for  the  example  is 
shown  in  table  2  in  section  5  in  the  k*— 3  column. 

4.  Three  types  of  confidence  intervals  are  computed  and  their  proper  usage  is  discussed. 
These  intervals  provide  the  user  with  a  reahstic  estimate  of  the  uncertainty  associ- 
ated with  the  calibrated  force  sensor  response  function,  and  they  allow  the  reahstic 
uncertainty  of  further  use,  such  as  inverse  prediction,  to  be  computed. 
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5.  Three  types  of  inverse  prediction  are  discussed  in  which  unknown  loads  are  measured 
by  the  cahbrated  force  sensor.  The  uses  of  these  methods  are  graphically  illustrated  in 
figures  9,  10,  and  11. 

4      Statistical  Analysis 
4.1      Statistical  Model 

The  sensor  response  to  axial  forces  may  be  considered  fixed  throughout  the  calibration,  but 
the  response  to  non-axial  forces  must  be  considered  random  because  it  changes  unpredictably 
whenever  the  sensor  is  re-oriented  in  the  loading  machine.  The  fixed  response  to  axial  loads 
Li  is  modeled  by  the  A;-degree  polynomial 

vk{Ld  =  J:m  (1) 

where  k  is  usually  two  or  three.  During  the  j^^  orientation  of  the  sensor  [j^^  run),  the  random 
response  to  non-axial  forces  is  modeled  by  the  A-degree  polynomial 

h 
qhj{Li)  =  Yl  UtjL\ 
t=o 

where  h  is  usually  zero  or  one  and  j  indexes  the  runs.  Though  assumed  to  be  drawn  randomly 
from  a  population,  the  {wtj|i  =  0,1,..., /i}  are  treated  as  fixed  effects  for  the  duration  of 
the  j^^  run.  In  some  cases  the  random  effects  may  prove  to  be  not  significant,  in  which  case 
h  is  taken  to  be  —1  and  the  {ufj}  do  not  exist. 

The  following  notation  will  be  used  in  the  statistical  analysis  which  incorporates  these 
polynomial  models.  Let 

n     =     the  number  of  distinct  nonzero  loads  applied  per  run, 
r     =     the  number  of  runs  (sequences  of  n  loads), 
Li    =     the  i*''  nonzero  load  in  the  loading  sequence 

{i  =  l,2,...,n), 
yij    =    the  observed  (net)  value  of  the  instrument  output 

when  load  Li  is  applied  during  run  j  [j  =  1, 2, . . . ,  r), 
^t    —     the  coefficient  of  L]  in  the  fixed  response 

(^  =  0,1,...,/:), 
Utj     —     the  coefficient  of  Lj  in  the  random  response 
(^  =  0,l,...,A;^-  =  l,2,...,r), 
k    =     the  degree  of  the  fixed  response  polynomial, 


h  =  the  degree  of  the  random  response  polynomials, 

k-min  =  the  lower  bound  on  k  (usually  1), 

hmin  —  the  lower  bound  on  h  (usually  —1), 

kmax  —  the  upper  bound  on  fc, 

hmax  —  the  upper  bound  on  /i,   and 

tij  ■=  the  random  'noise'  in  the  instrument  output 

when  the  i^^  load  is  apphed  during  the  j^^  run. 

The  statistical  model  then  takes  the  form 

Vij    =    Pk{Li)  +  qhj{Li)  +  €ij 

=    f3o  +  i3iLi +  ...-{-  pkL\  +  uo,  +  wi,L,  +  . . .  +  UhjL^i  +  e.-j  (2) 

where  the  polynomial  degrees  k  and  h  [k  >  h)  are  to  be  determined  from  the  data  by 
statistical  tests.  The  {e,j}  are  assumed  to  be  independent  error  values  from  a  distribution 
with  mean  zero  and  variance  a^.  (Recall  from  section  2.2  that  the  y,j  are  assumed  to  be 
independent  even  though  correlations  may  have  been  introduced  through  corrections  for 
instrument  drift).  The  vectors  of  random  coefficients  (uqj  Uij  . . .  u^j)^  {j  =  1, 2, . .  . ,  r)  are 
assumed  to  be  independent  with  expectation  (0  0  ...  0)^  and  variance  S^,  an  {h-\-l)  x  (A  +  1) 
symmetric  positive  definite  matrix.  (The  't'  superscript  indicates  matrix  transposition.) 

The  'best'  degree  polynomial  fit  to  data  may  be  determined  either  by  starting  with  a  low 
degree  polynomial  and  adding  higher  order  terms  which  are  significant,  or  by  starting  with 
a  high  degree  polynomial  and  deleting  higher  order  terms  which  are  not  significant.  After 
due  consideration  the  method  of  'deleting  terms',  as  recommended  by  Hager  and  Antle  [9], 
was  determined  to  be  the  most  appropriate  for  computing  the  'best'  values  of  both  k  and  h 
(also  see  Hoel  [12]).  This  requires  the  user  to  establish  the  upper  bounds  kmax  and  hmax  (on 
k  and  h  respectively)  for  the  initial  fits. 

A  convenient  way  to  analyze  model  (2)  is  to  work  with  the  independent  sets  of  observa- 
tions {yi}  and  {zij}  derived  from  the  {yij}  by 

r 

Vi    =     ^Vijlr    (i  =  1,2,  ...,n)    and 
Zi]     =     Vi]  -  Vi    (i  =  1,2,  ...,r). 

These  sets  can  be  identified  as  the  mean  data  and  the  difference  data  respectively. 
In  matrix  notation  the  m,odel  for  the  mean  data  is 

y=:X;9+XiU  +  6  (3) 

where 

y  =   {h  y2  ■■■  Vn)^, 


p   = 


u 


(^0  Pi    . 
{uo  Ui    . 


-■  Y    with    ut  =  ^utj/r, 


=     (ei  €2 


e„)^    with    tt  =  ^  ctj/r, 


and 


X  =  (Xi  X2)  - 


/  1 
1 


L2 


h+l 


V    1       Ln 


Li    I    Li 
Lfi    I    L^+' 


Li    I    L^^ 


L', 


LU 


In  statistical  terminology  (3)  is  a  'mixed'  model  because  of  the  presence  of  fixed  (^)  and 
random  (u,  e)  effects.  It  follows  that  E(y)  =  X;3  and  Var(y)  =  (XiE^Xf +crel)/^  =  ^  where 
I  is  the  n  X  n  identity  matrix  and  V  is  an  n  x  n  covariance  matrix  which  is  positive  definite 
when  al  >  0  [E(-)  and  Var(-)  denote  the  statistical  expectation  and  variance].  Giesbrecht  and 
Burns  [8]  show  that,  when  estimates  of  S^  and  a^  are  available,  the  generalized  least  squares 
estimate  of  the  fixed  effects  is  ^  =  (X^V-iX)-iX^V-iy  with  Y^r{^)  =  ±^  =  (X^V-^X)"! 

where  V  =  (XiSyXf  +  alT)/r.  In  the  'large  sample'  case  the  estimate  /3  is  known  to  have  a 
smaller  variance  than  the  ordinary  least  squares  estimate.  (Methods  for  testing  hypotheses, 
computing  confidence  intervals,  and  approximating  degrees  of  freedom  are  also  given  in  [8]. 
For  more  discussion  of  the  mixed  model  see  Cunningham  and  Henderson  [5],  Henderson  [10], 
and  Searle  [21;22,  ch.  9-10].) 

In  section  4.2  it  will  be  shown  that  the  variance  estimate  S^^  with  r—  1  degrees  of  freedom 
is  obtainable  from  the  difference  data.  Current  practice  at  NBS  is  to  make  three  runs,  thus 
Su  is  based  on  only  two  degrees  of  freedom.  Although  the  number  of  runs  could  be  increased 
some,  it  is  not  feasible  to  make  r  —  1  'large'.  Since  bad  estimates  of  /3  could  result  from  using 
Su  based  on  very  few  degrees  of  freedom,  the  above  generalized  least  squares  approach  will 
be  set  aside  in  favor  of  ordinary  (unweighted)  least  squares. 

For  the  purpose  of  computation  the  {ufj}  are  treated  as  fixed  effects,  thus  model  (3)  for 
the  mean  data  can  be  re-expressed  as 


y  =  X«  +  6 


(4) 


where 


e     =    (^0  +  Uo   f3i  +  ui 
=    {9q  9i  .  ..  6k) 


Ph  +  Uh   /3h+i    (3h+2  •  ••  /^kY 


(5) 


and  €  is  as  before.    In  this  formulation  the  mean  effect  w^,  treated  as  a  fixed  effect,  gets 
'absorbed'  into  the  effect  /?<(<  =  1,2, ...,  /i)  thus  only  the  net  effect  6t  can  be  estimated.  It 
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follows  that 


E(y)    =    X^,    and 
Var(y)    =    lal/r. 


This  model  will  be  analyzed  in  section  4.3. 
The  model  for  the  difference  data  is 


where 


Zi  = 


Z2 

V  z,  / 


/  ^1.  \ 


/  Xi      0 
0     Xi 

V  0     0 


0 


Xi  J 


d2 
Vd,  J 


+ 


62 


,    d,  = 


V    Znj    J 


(    Uoj  —  Uo    \ 
Uij  —  Ui 

\   Uhj  -  Uh   I 


1     ^j  — 


e2j  —  £2 


(6) 


(7) 


and  the  zero  matrices  (0)  are  n  x  [h  -{-  1).    Treating  the  differences  dj  as  fixed  effects,  it 
follows  that 


E(zj)    =    Xidj    and 
Var(z,)    -    Ial{r-l)/r    (j  =  1,2, . . . ,  r). 


(8) 


This  model  is  analyzed  in  the  next  section. 


4.2      Analysis  of  Difference  Data 

Applying  the  usual  method  of  unweighted  least  squares  (see  Cameron  [2,  pp.  24-26])  to  (8), 
for  a  given  h  >  0,  yields  the  least  squares  estimates 


-Tv  \-ivT 


d,-(Xi^Xi)-^X,^z,    (i  =  l,2,...,r). 


(9) 


Recall  from  section  4.1  that  the  between-run  effects  modeled  by  {utj}  may  not  always  be 
significant.  When  that  is  the  case  there  are  no  {dj}  to  estimate,  since  model  (7)  is  degenerate, 
and  h  is  taken  to  be  —1. 

The  total  residual  sum  of  squares,  incorporating  the  r  systems  in  (7)  and  (8),  is 


CO  (h\  -  !  ^U(^i  -  Xid,)'"(Z;  -  X,d,)    (h  >  0) 


so  that 


E[SSM]  =  {n-h-l){r-l)a 


(10) 


(11) 
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Note  that  the  factor  r  —  1  appears  in  (11)  instead  of  r  since  the  means  {e,}  were  subtracted 
from  the  error  values  {cjj}  in  (7). 

The  following  algorithm  is  used  to  determine  from  the  diiference  data  the  'best'  degree, 
h*,  of  the  polynomials  qhj{Li)  modehng  the  {utj}  effects.  Statistical  hypothesis  testing  at 
a  level  of  significance  Qh  is  used  to  determine  whether,  for  a  given  h^  the  terms  {uhjL'l\j  = 
1,2, ...  ,r}  are  needed.  The  /i^ax-degree  model  is  fit  first,  then  insignificant  higher  degree 
terms  are  dropped  r  at  a  time.  The  constraint  hmin  ^  h*  <  hmax  is  applied  where  hmin  =  —  1- 

Algorithm  H. 

1.  oGZ  fl  ^  ri-max- 

2.  \i  h  =  hmin  then  step  5.  Else, 

3.  Set  Ve  =  {n  —  h  —  l)(r  —  1)  and  compute  the  statistic 

^   ^  [SSe(fe-l)-SSe(/^)]/(r-l) 

SSe{h)/l>e 

which  has  the  F(r  —  1,  i/g)  distribution  under  the  hypothesis  Uhi  —  Uh2  =  . . .  —  Uhr  =  0. 
Use  (9)  and  (10)  as  appropriate. 

4.  If  Gf (r_i,i/^)(iVi)  <  1  —  a/i  set  h  =  h—1  then  step  2  (the  left  hand  side  of  the  inequaUty 
is  the  cumulative  distribution  function  of  the  F  distribution  with  parameters  r  —  1  and 
i^e  as  defined  in  appendix  A).  Else, 

5.  Set  h*  =  h, 

u,    =    {n-h*-l){r-l)  (12) 

<7,2      =      SSe(/.*)/z/e,  (13) 

and,  if  h*  >  0, 

±u  =  td,dj/{r-l).  (14) 

i=i 

6.  Done. 

When  h*  >  0  the  estimated  mean  squared  error  for  between-run  effects  is  computed  from 
(10)  and  (11)  to  be 

.2       SSe(-l)-SSe(/^*) 

The  statistic  F^,  —  ^l/crl  then  has  the  F[{r—  l){h*  +  1),  Ve]  distribution  under  the  hypothesis 
{utj  =  0\t  =  0,1,...,  h*]j  =  1,2,  ...,r}.  It  may  seem  strange  to  calculate  this  statistic 
when  the  fact  that  h  >  0  already  implies  that  the  between-run  effects  are  significant  as 
determined  by  the  statistic  Fh  in  step  3  of  algorithm  H.  The  usefulness  of  Fb  is  that  it  tests 
the  significance  of  the  r{h*  -f  1)  nonzero  {wfj}  effects  as  a  whole  whereas  F^  tests  only  r  of 
the  {utj}  at  a  time.  The  level  of  significance  of  Fj,  may  then  be  displayed  as  the  significance 
of  the  between-run  effects. 
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4.3      Analysis  of  Mean  Data 

Applying  the  method  of  unweighted  least  squares  to  (4)  and  (6),  for  a  given  k  >  0,  yields 

9  =  (X^X)-iX^y  (16) 

where 

Var(^)  =  S^  =  (X^X)-V,Vr.  (17) 

After  fitting  the  0  effects  the  remaining  components  of  error  in  (4)  are  the  mean  noise  effects 
€  and  possible  'model  error'  due  to  the  exclusion  of  significant  higher  order  terms.  In  the 
manner  of  Draper  and  Smith  [4,  pp.  26-30]  the  model  error  /z,  at  load  Li  is  taken  to  be 

fii  =  Ti  -  E[pk{Li)]    (i  =  1, 2, . . . ,  n) 

where  Ti  is  the  'true'  device  response  to  the  axial  load  Li  and  pk{Li)  is  as  in  (1).  If  the  model 
is  correct  then  ^,  =  0  (i  =  1,2, . . .  ,n).  The  individual  model  error  values  {/ij}  cannot  be 
estimated  (since  the  true  values  {r,}  are  unknown),  but  their  variance  cr^  =  ^7=i  A'?/'^  ^^^ 
be  estimated  as  will  be  shown  later  (/i  =  Z^iLi  f^i/''^  is  assumed  to  be  zero). 

After  obtaining  least  squares  estimates  for  0  by  (16),  under  the  /^-degree  model,  the 
residual  sum  of  squares  in  (4)  is 

SSe^(A;)  =  (y-X0)^(y-X^)  (18) 

so  that 

The  following  algorithm  is  used  to  determine  from  the  mean  data  the  'best'  degree,  k*,  of 
the  polynomial  pk{Li)  modeling  the  'true'  response  curve.  Statistical  hypothesis  testing  at 
levels  of  significance  am  and  ak  is  used  to  determine  whether,  for  a  given  k,  the  term  OkL'-  is 
needed.  The  A;^a2.-degree  model  is  fit  first,  then  insignificant  higher  degree  terms  are  dropped 
one  by  one.  The  term  OkL'-  is  dropped  if  either  6k  or  the  model  error  under  the  {k  —  1)- 
degree  model  is  determined  to  be  not  significant.  The  constraint  max{/i*,  kmin}  ^  k*  <  kmax 
is  applied  where  kmin  =  1-  It  is  fully  expected  that  for  some  force  sensors  the  model  error 
cannot  be  eliminated. 

Algorithm  K. 

i.     oet   K  '=■  rZrnax' 

2.  \ik  <  max{/i*,  kmin}  then  step  7.  Else, 

3.  Set  Vem  =  n  —  k  —  1  and  compute  the  statistic 


Frr.     = 


^l 


which  has  the  F{vemi  ^e)  distribution  under  the  hypothesis  cr^  =  0.  Use  (16)  and  (18). 
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4.  If  GF(^^^^t,^){Fm)  <  1  -  a^  or  k  =  k^ax  set  k  =  k  -  I  then  step  2.  Else, 

5.  Set  fc  =  A;  +  1,  Vgrn  =  n  —  k  —  \^  and  compute  the  statistic 

which  has  the  F(l,  Vem)  distribution  under  the  hypothesis  6^  =  0. 

6.  If  GF(i^t,^^)[Fk)  <  1  —  Qjt  set  k  =  k  —  2  then  step  2.  Else, 

7.  Set  A;*  =  k, 

^em     —     n  —  k*  —  I,    and  (19) 

aL     =    rSSem{k*)/uem-  (20) 

8.  Done. 

Having  determined  h*  and  k*,  the  'design'  matrix  X  is  fixed  and  0  is  computed  by 
(16).  The  estimated  variance  of  9  is  obtained  by  substituting  the  proper  estimate  of  a^  into 
(17).  When  a^^  is  significantly  larger  than  cr^,  indicated  by  a  large  value  of  Fm  in  step  3 
of  algorithm  /i,  then  a^^  should  be  used  so  that  the  eifects  of  significant  model  error  are 
included  in  the  estimated  variance  of  9 ,  thus 

±^  =  {X'Xr'crlJr.  (21) 

(Note  that  model  errors  are  actually  biases,  and  the  quantity  al^  contains  the  mean  squared 
model  error  as  a  reasonable  estimate  of  bias  effects.) 

When  (7gj^  is  not  significantly  larger  than  a^  the  two  values  may  be  pooled  to  give 

where 

Vep  =  fe  -\-  t^em  =  r{n  -  h*  -  1)  +  h*  -  fc*,  (22) 

thus 

S^  =  {X-^Xy^a^/r.  (23) 

Recalling  from  (5)  that 


r  A  +  wt  {o<t<h*) 
'    \  (3t       {h* <t< k*) 


(24) 


and  that  E(w<)  =  0,  an  estimate  of  /9  is 

h  ^9  (25) 

with 

Vrr(^)  =  S^  =  S,  +  i(^"    °)  (26) 

where  S^  is  computed  from  either  (21)  or  (23)  as  appropriate,  S^  is  as  in  (14),  and  the  zero 
matrices  (0)  are  of  the  proper  size  to  make  the  rightmost  matrix  in  (26)  (A;*  +  1)  x  {k*  +  1)- 
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4.4      Summary  of  Error  Contributions 

To  recapitulate  from  section  2.3,  the  five  types  of  error  considered  are 

1 .  roundoff, 

2.  within-run, 

3.  model, 

4.  between-run,  and 

5.  systematic. 

A  review  of  past  calibrations  of  force  sensors  at  NBS  has  shown  that  any  of  these  sources  of 
error  can  be  dominant.  It  would  seem  to  be  useful  then,  for  a  given  calibration,  to  estimate 
the  relative  contribution  of  error  from  each  of  the  sources.  One  measure  of  error  from  a 
given  source  is  a  'cr-like'  value  to  represent  the  'average'  contribution  of  that  source  to  a 
single  observation.  Estimates  of  such  values  can  be  computed  from  the  data  for  the  first 
four  sources.  For  the  fifth  source  an  absolute  limit  {E)  is  used  for  combined  effects  of  all 
'systematic'  errors  as  described  in  section  2.3.  It  is  not  a  'cr-like'  value. 

In  table  1  the  sources  of  error  are  listed  along  with  their  'cr-hke'  estimators  (5).  The 
5-values  are  for  rough  comparison  of  sources  of  error  only.  They  are  not  intended  to  be 
'corrected'  values  of  standard  errors  previously  computed.  (In  the  past  some  methods, 
which  are  valid  under  certain  conditions,  have  been  given  for  'correcting'  error  variances 
for  roundoff  error.  See  Eisenhart  [6].)  Hopefully  the  5- values  will  aid  the  experimenter 
in  understanding  the  characteristics  of  the  force  sensor-machine  combination  and  perhaps 
indicate  areas  where  improvements  could  be  made. 

Table  1:  Estimators  of  known  sources  of  error 


Error  type 

Error  'estimator' 

Reference  eqs. 

roundoff 

Sr  =  ^^i:u^l 

(27) 

within-run 

5e  =  ^max{0,a2-52} 

(13,27,28) 

model 

S^  =  -ymaxjO,  5-2^  -  a^} 

(13,20,28,29) 

between-run 

S,  =  yj{h*  -f  l)al/n 

(15) 

systematic 

E  =    combined  limits 

At  this  point  the  method  of  determining  roundoff  error  must  be  addressed.  Often  data 
from  a  digital  readout  is  inherently  rounded.  In  recording  the  data  the  user  may  round 
even  further.  Traditionally  (see  Snedecor  and  Cochran  [24,  p.  81])  the  rounding  error,  /),  is 
treated  as  a  uniformly  distributed  random  variable  in  the  interval  [—A/2,  A/2]  where  A  is 
the  smallest  possible  difference  between  two  unequal  numbers.  It  follows  that  E(p)  =  0  and 


15 


Var(p)  =  A^/12.  For  the  sake  of  generality  suppose  that  the  rounding  interval  of  the  readout 
device  is  A,  at  load  L,-  and  that  pij  is  the  rounding  error  corresponding  to  the  observation 
yij.  Then  the  'mean  squared'  rounding  error  is 

In  models  (4)  and  (7)  the  {tij}  values  were  said  to  represent  the  'noise'  in  the  measurement 
system.  In  fact,  each  e,j  also  includes  the  roundoff  error  pij. 
The  quantities  in  (13)  and  (20)  can  be  split  into  components 

al    ^    5,2  +  5,2  (28) 

^L     -     52  +  52  +  5^  (29) 

to  yield  estimates  of  5^  and  5,  free  of  roundoff  error.  The  variance  for  between-run  effects, 
(jj,  in  (15)  reflects  the  'average'  variance  per  nonzero  Utj.  Multiplying  this  by  [h*  +  1)/?^ 
yields  an  'average'  variance  per  observation,  S^. 

4.5      Confidence  and  Prediction  Intervals 

The  final  step  in  the  calibration  process  is  to  compute  uncertainty  bounds  for  the  quantities 
of  interest.  In  the  present  case  those  bounds  are 

1.  a  confidence  interval  for  the  'true'  sensor  response  at  a  given  load, 

2.  a  confidence  band  for  the  'true'  sensor  response  curve  over  its  entire  range,  and 

3.  a  prediction  interval  for  a  new,  independent  observation  with  the  sensor  randomly 
oriented  in  the  same  machine. 

The  bounds  in  2  and  3  may  further  be  used  for  inverse  prediction  as  discussed  in  section 
4.6.  Seber  [23,  ch.  5]  serves  as  a  reference  for  the  computations  which  follow.  In  each  of  the 
above  cases  formulas  are  given  for  two-sided  100(1  —  S)%  bounds  where  5  is  a  probability  in 
the  exclusive  range  (0,1). 

Note  that  one  item  missing  from  the  above  list  is  the  vector  of  fixed  effects  y3.  In  most 
regression  applications  a  confidence  interval  for  (3  would  be  of  interest,  but  here  it  is  not. 
If  such  an  interval  were  of  interest  it  could  be  formed  using  the  quantities  defined  in  (25) 
and  (26),  but  the  question  of  the  appropriate  number  of  degrees  of  freedom  to  assign  to  the 
variance  in  (26)  would  be  difficult.  A  frighteningly  complicated  approximation  is  given  in 
[8]. 

Incorporating  (24)  the  'true'  sensor  response  to  the  axial  load  L  can  be  expressed 

16 


Taking  E{ut)  =  0  the  predicted  sensor  response  is 


k* 


t=o 

=   x'^e  (30) 

where  A  =  (Ai  As)^  =  (1  L  . . .  L'^*  |  1^'+'   . . .  L^'f,  and 

Var[pfc.(L)]     =     A^i^A +Aii„Ai/r 

-     t/i  +  f/2.  (31) 

A  confidence  interval  for  the  'true'  response  at  load  L,  pk*{L),  is  then 

Pk*iL)  ±  ti_5/2{ui)\/y8,T[pk>iL)] 

where  Vi  is  the  number  of  degrees  of  freedom  associated  with  the  estimated  variance  and 
ti_s/2{^i)  is  a  quantile  of  Student's  t  distribution  with  Ui  degrees  of  freedom  (see  appendix 
A).  Since  the  estimated  variance  in  (31)  is  a  linear  combination  of  two  variances,  an  'effective' 
number  of  degrees  of  freedom  is  computed  by  the  Welch-Satterthwaite  formula  (see  Gaylor 
and  Hopper  [7])  to  be 

_  (U^  +  U.r 

"'-mive  +  UiKr-D  ^''' 

where 

as  in  (19),  if  ^^  was  computed  by  (21) 


as  in  (22),  '\i  H  q  was  computed  by  (23) 


Note  that  \ih*  =  —1  then  17  ^  is  degenerate,  thus  U2  =  0  and  ui  =  uq.  This  type  of  confidence 
interval  is  appropriate,  for  example,  if  the  sensor  response  at  nominal  maximum  load  is  to 
be  used  as  a  check  standard.  This  type  of  interval  is  illustrated  by  the  dotted  lines  in  figure 
12  for  the  example  in  appendix  B. 

A  confidence  band  for  the  entire  'true '  response  curve  is  taken  to  be  the  set  (over  L)  of 
all  individual  confidence  intervals  for  Pk*{L)  of  the  form 

Pk*{L)±K^  (33) 

where 

Ki  =  yj{k*  +  l)F,_s{k*  +  l,i/i)Var[p,.(i:)]  (34) 

and  Fi_s{k*  -{-  !■,  I'l)  IS  a.  quantile  of  the  F  distribution  with  k*  -\-l  and  Vi  degrees  of  freedom 
as  described  in  appendix  A.  This  type  of  band  is  illustrated  by  the  dashed  line  in  figure  12. 
A  third  quantity  of  interest  is  a  confidence  interval  for  a  new  observation  y*  obtained 
when  the  sensor  is  re-oriented  in  the  same  measuring  machine  and  a  load  L*  applied.  The 
new  observation  can  be  written 

y*=Pk.{L*)  +  xfu  +  e* 
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where  A*  =  (A^  A^)  =  [l  L*  . . .  L*^'  \  L*('^*+i)  . . .  L*'=*)^,  u  is  a  random  {h*  +  l)-vector 
from  a  distribution  with  mean  zero  and  variance  E  „,  and  e*  is  an  independent  deviate  from 
a  distribution  with  mean  zero  and  variance  a^.  It  follows  that 

E(?/*)     =    pAL*)    and  (35) 

Var(y*)     =    Varb,.(L*)]  +  A*^17,At+a,2 

=    xfEuK  +  (^l-  (36) 

Replacing  the  expectation  and  variance  of  y*  by  their  estimates  and  incorporating  (30)  and 
(31)  gives 

y*    =  pk*{L*)   =  A*^^    and 

Var(y*)     =  Var[p,*(L*)]  +  A^  i,At  +  a^' 

=  A*^f^A*  +  (l  +  i)Aff  .a;  +  <7,2  (37) 

=  V1  +  V2  +  V3. 

As  before  the  'effective'  number  of  degrees  of  freedom  for  the  estimated  variance  in  (37)  is 

(V. + V. + v^r  .  , 

"'       Vfl„e  +  V}l{r-\)  +  V}lu,  ^^'•' 

where  vq  is  as  in  (32)  and  i/g  is  as  in  (12).  As  before,  if  h*  —  —1  then  l^u  is  degenerate  and 
(38)  is  computed  with  V2  =  0.  A  confidence  interval  for  a  new  observation  y*  as  defined 
above  is  then 

Pk>{r)±K2  (39) 

where 

K2  =  h-si2{v;)\[^Mf)-  (40) 

This  type  of  interval  is  illustrated  by  the  soHd  line  in  figure  12. 

4.6      Inverse  Prediction 

The  calibrated  force  sensor  is  ordinarily  used  to  measure  an  unknown  load  L*  from  a  single 
new  observation  y*  on  some  machine,  not  necessarily  the  one  on  which  it  was  cahbrated. 
The  desired  result  is  an  estimate  of  the  magnitude  of  the  unknown  load  and  a  confidence 
interval  for  it.  This  application,  known  as  inverse  prediction,  consists  of 

•  computing  the  calibration  curve  and  the  appropriate  confidence  band(s)  for  the  sensor, 

•  obtaining  one  or  more  new  observations  at  unknown  load(s),  and 

•  estimating  each  unknown  load  and  its  confidence  interval. 
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Three  cases  of  inverse  prediction  with  a  calibrated  force  sensor  will  be  considered  here: 

•  a  single  use  case  on  the  same  machine, 

•  a  multiple  use  case  on  the  same  machine,  and 

•  a  multiple  use  case  on  a  different  machine. 

The  term  multiple  use  simply  means  that  the  same  calibration  curve  will  be  used  for  inverse 
prediction  more  than  once.  This  situation  is  beheved  most  likely  to  occur  in  practice,  thus 
the  multiple  use  methods  (described  later  in  this  section)  are  recommended. 

The  single  use  case  will  be  given  only  for  use  on  the  same  machine.  A  confidence  interval 
for  a  new  observation  y*  can  be  obtained  for  any  L  by  (39).  So,  given  y*,  a  100(1  —  S)% 
confidence  interval  for  the  unknown  load  L*  is  the  set  of  L's  for  which  the  confidence  interval 
calculated  by  (39)  encloses  y*,  i.e.,  for  which 

Pk'iL)-K2<y*<Pk'{L)-\-K2 

where  K2  is  a^  in  (40).  This  set  will  be  connected  when  the  upper  and  lower  bounds  of  the 
confidence  band  are  monotonic.  The  estimate  of  L*  is  L*  which  satistifes 

Pk*{L*)  =  y*. 

If  k*  >  1  then  L*  may  have  to  be  computed  iteratively.  Figure  9  illustrates  the  graphical 
procedure  for  determining  L*  and  the  confidence  interval  {L*_,L'^)  for  L*.  Again,  this  type 
of  confidence  interval  for  L*  is  valid  only  when  a  single  inverse  prediction  is  made  after  a 
given  calibration. 

In  the  m,ultiple  use  case  [same  machine)  a  well  known  but  complicated  method  of  inverse 
prediction  is  that  of  SchefFe  [20].  In  a  forthcoming  paper  Carroll,  Sacks,  and  Spiegelman  [3], 
hereafter  designated  CSS,  present  a  modification  of  the  SchefFe  method  which  is  far  simpler 
to  implement  and  generally  leads  to  shorter  intervals.  The  first  part  of  the  CSS  method  is 
to  compute  the  100(1  —  6)%  confidence  band  for  the  entire  response  curve  as  in  (33)  and 
(34).  This  band  is  illustrated  in  figure  11.  The  second  part  is  to  compute  a  100(1  —  a)% 
confidence  interval  for  the  'true'  response  pk*{L*)  based  on  the  new  observation  y*.  The 
estimated  variance  of  y*  is  computed  from  (36)  to  be 

Var(y*)     =     xfs^Xl-\-al 

=    W^-^W2  (41) 

which  is  'effectively'  based  on 

_     {w,  +  w^r  ,,,. 
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degrees  of  freedom.  If  17  „  is  degenerate  then  Wi  =  0  and  u^  =  u^.  Notice  that  the  variance  of 
Pk*{L*)  does  not  enter  into  this  part  of  the  computation.  An  appropriate  confidence  interval 
for  pk*{L*),  incorporating  (35)  and  (41),  is 

y"  ±  Ks  (43) 

where  

Ks  =  ii-a/2(^3*)\/Var(r)  (44) 

Note  that  the  variance  term  Wi  in  (41)  is  a  function  of  L*  (through  A*)  which  is  unknown. 
In  this  case  L*  may  be  approximated  by  Pk}{y*)  in  the  vector  A^. 

A  confidence  interval  for  the  unknown  load  L*,  using  the  CSS  procedure,  is  then  the  set 
of  all  X's  for  which 

Pk*{L)  -  lU  -  Ks  <y*<  PAL)  +  K,  +  K3  (45) 

where  Ki  and  K3  are  as  in  (34)  and  (44)  respectively.  This  procedure  assures  that,  for  a 
given  cahbration  process,  the  probability  is  1  —  ^  that  the  expected  proportion  of  calculated 
confidence  intervals  for  L*  that  actually  cover  L*  will  be  at  least  1  —  a  in  the  long  run.  The 
probabihty  1  —  ^  refers  to  the  probability  that  the  calibration  of  the  sensor  is  'good',  that 
is,  the  confidence  band  covers  the  'true'  underlying  response  curve.  Figure  10  illustrates  the 
graphical  procedure  for  determining  L*  and  the  confidence  interval  (Z,1,L^)  for  L*  in  the 
case  under  consideration. 

In  the  multiple  use  case  {different  machine  from  the  one  on  which  the  sensor  was  cali- 
brated) the  previous  analysis  may  be  applied  using  the  variance  estimates  S  u  and  a"^  appro- 
priate to  that  machine.  If  these  quantities  are  available  then  they,  along  with  their  degrees 
of  freedom  which  correspond  to  r  —  1  and  i/g,  may  be  substituted  into  (41)  and  (42)  to  give 
Ks  as  in  (44).  The  prediction  of  L*  and  its  confidence  band  then  follow  as  before.  In  the 
likely  event  that  both  of  these  quantities  are  not  available,  an  approximate  100(1  —  a)% 
confidence  interval  for  pk*{L*)  of  the  form 

y*  ±  K, 

corresponding  to  (43)  may  be  used  where  K^  is  based  on  statistical  evidence  or  'engineering 
judgement'.  This  interval  is  used  in  conjunction  with  (33)  to  provide  an  estimate  L*  and  a 
confidence  interval  {L*_^L*^)  for  L*  as  illustrated  in  figure  11.  The  probability  statement  for 
the  coverage  is  the  same  as  in  the  previous  case  (fig.   10). 

In  CSS  two  modifications  are  discussed  which  will  shorten  the  inverse  prediction  intervals 
slightly,  but  for  the  sake  of  simphcity  they  will  not  be  used  here. 

5     Example 

In  the  process  of  developing  the  model  described  in  this  paper,  many  sets  of  force  calibration 
data  were  analyzed.  The  complete  analysis  of  one  of  these  data  sets  will  now  be  illustrated. 
The  example  chosen  is  typical  in  that 
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•  three  runs  were  made, 

•  the  between-run  error  appears  to  be  well  modeled  by  linear  pieces,  and 


the 


residuals  from  a  quadratic  fit  appear  to  have  a  serpentine  form. 


The  example  was  analyzed  by  the  new  method  with  hmax  =  3,  kmax  —  ^i  och  =  0.01, 
am  =  0.10,  and  a^  =  0.025.  At  each  stage  of  the  fitting  procedure  to  determine  h*  the 
probabiHty  of  overfitting  was  ah  =  0.01,  and  the  corresponding  probability  of  overfitting 
when  determining  k*  was  am<yk  —  0.0025.  The  analysis  was  performed  by  the  FORTRAN 
program  FCAL88  which  incorporates  the  new  method  described  in  this  paper.  The  output 
from  the  program  is  given  in  appendix  B.  The  net  observations  {yij}  appear  on  the  first  page 
of  that  output.  The  degrees  of  the  'best  fitting'  polynomials  were  determined  to  he  h*  =  1 
and  k*  =  3.  Figures  1  through  12  are  based  on  this  analysis  of  the  example.  In  figures  9 
through  11  the  confidence  intervals  have  been  magnified  by  a  factor  of  2000  vertically  for 
the  purpose  of  illustration. 

The  same  observations  were  re-run  with  h^ax  =  k^ax  =  2  in  order  to  limit  the  fits  to 
nothing  higher  than  a  quadratic.  In  that  case  h*  =  1  and  k*  —  2  were  computed  (the 
computer  output  is  not  shown).  Some  results  from  the  two  analyses  are  compared  in  table 
2  below. 

Table  2:  Comparison  of  two  fits  to  the  example  data 


Quantity 

k*=2 

k*=3 

Reference 

h* 

1 

1 

(Alg.  H) 

<3-e 

0.120-  10-4 

0.120-  10-4 

Eq.  (13) 

<^em 

0.754-10-4 

0.204-10-4 

Eq.  (20) 

00 

1.03347 

1.03347 

Eq.  (16) 

0, 

0.96743 

0.96767 

Eq.  (16) 

02 

0.32926  -  10-3 

0.32575  -  10-3 

Eq.  (16) 

03 

-0.33087  -  10-3 

Eq.  (16) 

Sr 

0.2887-10-^ 

0.2887-10-^ 

(Table  1) 

Se 

0.1167-10-4 

0.1167-10-4 

(Table  1) 

Sm 

0.1301  -  10-3 

0.3329-10-4 

(Table  1) 

Sk 

0.2263  •  10-4 

0.2263  -  10-4 

(Table  1) 

E 

0.2013-  10-4 

0.2013-  10-4 

(Table  1) 

Note  that  FCAL88  produces  a  two- page  table  with  four  kinds  of  95%  confidence/prediction 
intervals.    This  allows  inverse  prediction  to  be  done  by  interpolation  on  the  appropriate 
columns.    For  example,  suppose  an  unknown  load  L*  was  applied  (on  the  same  machine) 
and  that  an  independent  (net)  observation  y*  =  1.47901  was  obtained.  The  estimate  of  L* 
linearly  interpolated  from  the  Predicted  response  and  Load  columns  is  L*  =  221,757.2 
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(see  table  3  which  reproduces  part  of  the  two-page  table  from  appendix  B).  Using  columns 
Table  3:  Partial  listing  of  95%  confidence  interval?  for  the  example  data 


Load 

Predicted 
response 

Single  use 

Multiple  use 

[3]              [4 
(lower)      (upper) 

[7]             [8] 
(lower)      (upper) 

219,000 

1.46061 

1.46051      1.46071 

1.46044     1.46078 

222,000 

1.48063 

1.48053     1.48074 

1.48046     1.48081 

[3]  and  [4]  the  single  use  95%  confidence  interval  for  L*  is  (221,740.9  ,  221,772.2),  and  using 
columns  [7]  and  [8]  the  corresponding  multiple  use  interval  is  (221,730.4  ,  221,782.7).  If 
uncertainty  due  to  systematic  errors  is  to  be  accounted  for,  each  of  these  confidence  intervals 
would  be  increased  by  (0.002%)(221,  757.2)  «  4.4  in  each  direction.  (For  this  example  the 
load  unit  is  pounds-force.) 

For  numerical  stability  FCAL88  linearly  transforms  the  load  values  Li  to  the  interval 
[—1,1]  before  any  computations  begin  (a  good  practice  in  polynomial  regression).  This 
transformation  is  invisible  to  the  user  except  for  the  computed  values  of  S^  ('ESTIMATED 
COVARIANCE  MATRIX  FOR  BETWEEN-RUN  EFFECTS',  p.  B-3),  0  ('ESTIMATES  OF  POLYNOMIAL 
RESPONSE  CURVE  COEFFICIENTS  FROM  MEAN  DATA',  p.  B-7),  and  S^-  ('ESTIMATED  COVARI- 
ANCE MATRIX  FOR  POLYNOMIAL  COEFFICIENTS',  p.  B-7).  These  three  quantities  are  highly 
dependent  on  the  extreme  values  of  the  load  grid, 

Lmin  =  mmjLi,  L2, . . . ,  Ln}    and   Lmax  =  max{Li,  L2, . . . ,  Ln]  , 

which  transform  to  —1  and  1  respectively.  Consequently,  values  of  Sl^,  ^,  and  S^  obtained 
from  different  calibrations  are  comparable  only  if  Lmin  and  Lmax  are  the  same  in  each  case. 
The  linear  transformation  to  [—1, 1]  is 


2(L  —  Lmtn)  -, 

X  =■ 1  , 

T  —  T 


and  the  inverse  transformation  is 


J-   V"^    I    ^  )\^max        ^min)     |^    j- 


(46) 


(47) 


The  original  and  transformed  load  values  for  the  example  data  are  shown  in  table  4.  The 
virtue  in  transforming  is  seen  when  examining,  for  example,  S^  as  defined  by  (21)  or  (23). 
This  quantity  contains  the  factor 

U  =  (X^X)-^  (48) 

which  is  called  the  unsealed  variance  of  0.  Note  that  U  depends  only  on  the  pre-selected 
load  grid,  by  (3),  and  not  on  the  observed  data,  therefore  it  is  an  appropriate  means  for 
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Table  4:  Original  and  transformed  load  values  for  the  example  data 


Original 

Transformed 

i 

L^  (Ibf) 

Xi 

1 

10,000 

-1.000000 

2 

30,000 

-0.862069 

3 

60,000 

-0.655172 

4 

90,000 

-0.448276 

5 

120,000 

-0.241379 

6 

150,000 

-0.034483 

7 

180,000 

+0.172414 

8 

210,000 

+0.379310 

9 

240,000 

+0.586207 

10 

270,000 

+0.793103 

11 

300,000 

+1.000000 

comparison.  Using  the  Li  values  to  form  X,  equation  (48)  yields 


Ui 


0.116-10+°'  -0.295-10-°* 

-0.295  -  10-°*  0.104  •  10-°^ 

0.194-10-°^  -0.772-10-'* 

-0.368  -  10-'^  0.156  •  10-1^ 


0.194- 10-°^  -0.368-10-'^ 

-0.772-10-'*  0.156    10-'^ 

0.610-10-'^  -0.128-10-24 

-0.128-10-2*  0.277-10-3° 


(49) 


Similarly,  using  the  corresponding  Xi  values  to  form  X, 


U. 


0.214-10+°°  -0.149 -10-°2 

-0.149 -10-°2  0.157-10+°' 

-0.296  -  10+°°  0.160  ■  10-°2 

-0.164  ■10-°2  -0.186-10+°' 


-0.296-  10-°° 
0.160-  10-°2 
0.712-10+°° 
0.272-  10-°' 


-0.164 -10-°2 

-0.186-  10+°' 

0.272  -  10-°' 

0.257-10+°' 


(50) 


Note  that  the  lower  right  entry  in  the  right  hand  side  of  (49)  approaches  the  underflow  limit 
on  some  computers,  a  potentially  unpleasant  situation.  A  much  better  situation  occurs  when 
computing  with  the  transformed  loads,  as  illustrated  in  (50).  Transitions  to  and  from  the 
original  loads  are  easily  made  with  (46)  and  (47). 

6      Simulation  Results 

A  simulation  study  was  performed  using  the  estimated  quantities  computed  for  the  example 
(by  the  new  method)  as  a  basis.  Specifically,  10,000  sets  of  {yij}  values  were  generated  using 
(2)  where  (/?o  $i  $2  $3)  were  substituted  for  (/?o  /3i  ^2  f^s),  the  {cij}  were  generated  from 
a  normal  distribution  with  mean  zero  and  variance  a^,  and  the  {(woj  ^ij)}  were  generated 
from  a  multivariate  normal  distribution  with  mean  zero  and  variance  I!  y,-  No  allowance 
was  made  for  model  error.  Each  set  of  simulated  observations  was  run  under  the  FCAL88 
program.    The  number  of  occurrences  of  each  computed  (/i*,fc*)  pair  are  given  in  table  5 
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Table  5:  Computed  'best'  polynomial  degrees  for  10,000  simulated  data  sets 


h* 

k* 

1 

2 

3 

4 

5 

-1 

0 

0 

1683 

2 

0 

0 

0 

0 

1118 

1 

0 

1 

0 

0 

6949 

23 

17 

2 

0 

0 

104 

1 

0 

3 

0 

0 

99 

0 

3 

above.  The  levels  of  significance  used  were  ah  =  0.01,  a^  =  0.10,  and  ak  =  0.025  as  before. 
Note  that  the  entries  in  italics  represent  cases  of  overfitting  which  are  controlled  by  the  three 
levels  of  significance.  For  example,  the  number  of  entries  in  the  /i*  =  3  row  is  102  which 
approximates  the  expected  number  10,0000/1  =  100,  and  the  number  of  entries  in  the  /:*  =  5 
column  is  20  which  approximates  the  expected  number  10,000a^Q'fc  =  25.  Similar  numbers 
of  entries  would  be  expected  for  the  cases  h*  =  2  and  k*  =  i  respectively. 

Underfitting  is  little  cause  for  concern  in  this  simulation.  The  fact  that  no  cases  oi  k*  =  I 
or  2  were  observed  indicates  that  the  coefficient  ^^3  is  highly  significant  in  the  example.  On 
the  other  hand,  the  coefficients  of  the  linear  terms  in  the  between-run  eifects  are  much  less 
significant  since  they  were  determined  to  be  not  significant  about  28%  of  the  time. 

The  greatest  value  in  this  simulation  exercise  may  be  to  forewarn  the  user  that  repeated 
calibrations  of  the  same  force  sensor  may  result  in  several  different  (A*,  k*)  pairs,  even  though 
the  underlying  'true'  sensor  response  curve  does  not  change. 

7      Optimality  Considerations 

In  a  typical  NBS  calibration  of  a  force  sensor  the  applied  loads  are  uniformly  spaced  be- 
tween 10%  and  100%  of  sensor  capacity  (see  section  2.2).  The  question  naturally  arises  as 
to  whether  a  different  grid  of  appHed  loads,  with  the  same  number  of  points,  would  produce 
'better'  results  in  some  sense.  Seber  [23,  pp.  231-234]  discusses  several  types  of  optimahty 
criteria  for  polynomial  regression  of  a  fixed  degree.  Of  those  criteria  the  one  of  most  im- 
portance in  the  present  application  would  seem  to  be  the  minimization  of  the  variance  of 
9k^  the  fitted  coefficient  of  the  highest  power  of  the  load.  In  algorithm  K ,  6k  is  tested  for 
significance  for  one  or  more  values  of  k. 

For  a  given  load  grid  and  polynomial  degree  k  the  'design'  matrix  X  is  computed  as  in 
(3).  This  matrix  is  then  used  in  computing  the  vector  of  fitted  coefficients  0  as  in  (16)  and 
its  variance  as  in  (21)  or  (23).  As  discussed  in  section  5,  the  factor  (X-^X)"^  is  the  unsealed 
variance  of  0,  thus  the  unsealed  variance  of  0^  is  the  {k  -\-  1  ^  k  -\-  I)  entry  of  (X-^X)~^  (lower 
right  corner).  This  quantity  is  computed  for  polynomial  fits  of  degree  2,  3,  4,  and  5  for 
each  of  seven  different  load  grids  as  shown  in  table  6.   (In  this  paper  it  is  assumed  that  no 
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Table  6:  Comparison  of  unsealed  variances  of  the  fitted  coefficients  6^  for  several  load  grids 
with  n  =  10 


Unsealed  variance  of  6^ 

Load  grid  (%  of  sensor  capacity) 

k  =  2 

k  =  3 

k  =  A 

k^b 

1     10.0    10.0    10.0   55.0    55.0 
55.0    55.0    100.0    100.0    100.0 

0.417 

_ 

„ 

_ 

2     10.0    10.0    32.5   32.5    32.5 
77.5    77.5    77.5  100.0  100.0 

0.741 

1.63 

_ 

3     10.0    10.0    23.2   23.2    55.0 
55.0    86.8    86.8    100.0    100.0 

0.714 

3.00 

7.00 



^     10.0    18.6    18.6   41.1    41.1 
68.9    68.9    91.4   91.4    100.0 

0.800 

3.20 

12.79 

25.6 

5     10.0    12.7    20.5   32.5    47.2 
62.8    77.5   89.5   97.3    100.0 

0.740 

3.01 

12.19 

49.3 

^     10.0    15.0    20.0    30.0    45.0 
65.0    80.0    90.0   95.0    100.0 

0.874 

3.11 

11.63 

40.4 

^     10.0    20.0    30.0   40.0    50.0 
60.0    70.0    80.0   90.0    100.0 

0.777 

2.69 

10.21 

43.7 

^      Optimal  load  grid  in  10%-100%  range  for  2""^  degree  fit 

2  Optimal  load  grid  in  10%-100%  range  for  3'"'^  degree  fit 

3  Optimal  load  grid  in  10%-100%  range  for  4*''  degree  fit 
^     Optimal  load  grid  in  10%-100%  range  for  5*''  degree  fit 
^     Optimal  load  grid  in  10%-100%  range  for  9^''  degree  fit 
^     Load  grid  ^  rounded  to  nearest  5% 

^     Uniform  load  grid 

polynomial  higher  than  a  5*''  degree  will  be  fit.)  In  order  for  the  comparisons  to  be  fair,  each 
grid  contains  the  same  number  of  points.  The  first  five  of  these  grids  are  optimal  for  the 
indicated  polynomial  degrees.  They  are  the  so-called  Chebyshev  points, 


Xi  —  —  cosf  —  )      [i 


0,1,..., A;), 


linearly  transformed  from  the  interval  [—1,1]  to  the  interval  [10%,  100%]  times  sensor  ca- 
pacity. Note  that  in  the  case  n>  k  -\-\  optimality  is  achieved  by  replicating  measurements 
at  certain  loads.  In  all  cases  the  unsealed  variance  of  d]^  was  computed  from  the  equivalent 
loads  in  [—1, 1]. 

The  first  four  grids  provide  the  optimal  values  shown  in  boldface.  The  first  three  are  of  no 
use  since  there  are  not  enough  distinct  loads  to  fit  all  polynomials  of  interest.  Examination 
of  the  last  four  shows  that  none  is  consistently  better  than  the  others  over  the  degrees 


26 


considered.  By  a  slim  margin  the  uniform  grid  appears  to  be  best.  Based  on  this  brief 
analysis  there  seems  to  be  no  evidence  that  any  other  load  grid  should  be  preferred  over  the 
uniform. 

8      Conclusion 

Force  sensors  are  not  characterized  by  a  single  number  but  by  a  response  curve  over  a 
wide  range  of  loads.  The  precise  form  of  this  curve  depends  on  the  details  of  design  and 
construction  of  the  particular  sensor.  After  trying  various  mathematical  models  for  the 
response  curve,  it  was  concluded  that  variable  degree  polynomials  generally  provided  the 
best  fits. 

A  new  statistical  model  for  caHbrating  force  sensors,  based  on  automatic  procedures  for 
determining  the  degrees  of  polynomials,  has  been  presented.  It  attempts  to  model  several 
known  sources  of  error  and  quantize  their  relative  contributions.  Methods  for  computing 
several  types  of  confidence  and  prediction  intervals  have  been  given.  The  three  levels  of 
significance  used  in  determining  the  polynomial  degrees  may  be  adjusted  to  suit  the  users 
needs  as  may  be  the  levels  for  the  confidence/prediction  intervals.  Usage  of  the  calibration 
curves  for  inverse  prediction  has  also  been  discussed  and  illustrated.  A  brief  optimality  study 
has  shown  that  the  traditional  uniform  spacing  of  applied  loads  cannot  be  improved  on  when 
allowing  polynomial  fits  of  S*'*  degree  or  less. 

This  new  method  of  analysis  is  not  hkely  to  solve  all  the  problems  of  force  sensor  cal- 
ibration. The  author  believes,  however,  that  this  method  will  prove  to  be  a  useful  tool  in 
developing  a  better  understanding  and  a  better  characterization  of  their  behavior. 
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A     Definitions  of  the  F  and  t  distributions 

The  probability  density  function  (p.d.f.)  of  the  F{m,n)  distribution  may  be  expressed  as 


—         m  — 2 
2 


gF{m,n){^) 


r(T)r(f)(i  +  ^) 


2 


where  x  >  0,  m  >  0,  n  >  0,  and  T{a)  =  J^  x°^~^e~^dx.  This  function  is  plotted  in  figure  13 
for  the  case  m  =  5  and  n  =  18. 

The  cumulative  distnbution  function  (c.d.f.)  of  the  F{m,n)  distribution  may  be  ex- 
pressed as 

GF(m,n){^)  -    /      gF(m,n){u)du 

Jo 
where  a:   >   0.     This  function  is  plotted  in  figure  14  for  m  and  n  as  above.     Note  that 

G'F(m,n)(0)  =  0  and  GF(m,n){oo)  =  1- 

The  p-quantile  of  the  F{m,n)  distribution,  denoted  by  Fp{m,n),  is  the  x  value  which 
satisfies  GF(^rn,n){x)  =  p.  ' 

When  an  F  test  is  performed  either  the  c.d.f  or  a  quantile  must  be  computed.  Since 
the  c.d.f.  is  easier  to  compute  on  a  machine,  that  quantity  has  been  incorporated  into  the 
F  tests  of  algorithms  H  (section  4.2)  and  K  (section  4.3). 

In  similar  fashion  the  p.d.f  of  the  t{n)  distribution  is 


2 


where  — oo  <  a:  <  oo  and  n  >  0.  This  curve  is  plotted  in  figure  15  for  the  case  n=10.  The 
c.d.f  is 

Gt{n){x)  =  /      gt{n){u)du 

•^  — CO 

where  — oo  <  x  <  oo.  Note  that  Gt(n)i  —  oo)  —  0,  Gt{n){0)  =  0.5,  and  G((„)(oo)  =  1.  This 
function  is  plotted  in  figure  16  for  n  as  above.  The  p-quantile,  denoted  by  tp{n),  is  the  x 
value  which  satisfies  Gt(„)(ar)  =  p. 

In  this  paper  the  t  distribution  is  used  only  in  computing  confidence  intervals  (sections 
4.5  and  4.6),  thus  only  quantiles  are  required. 
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B  Output  of  FCAL88  Computer  Analysis  of  Example  Data 
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of  interest  to  the  consumer.  Easily  understandable  language  and  illustrations  provide  useful  background 
knowledge  for  shopping  in  today's  technological  marketplace. 

Order  the  above  NBS  publications  from:  Superintendent  of  Documents,  Government  Printing  Office, 
Washington,  DC  20402. 

Order  the  following  NBS  publications — FIPS  and  NBSIR  's—from  the  National  Technical  Information  Ser- 
vice, Springfield,  VA  22161. 

Federal  Information  Processing  Standards  Publications  (FTPS  PUB) — Publications  in  this  series  collectively 
constitute  the  Federal  Information  Processing  Standards  Register.  The  Register  serves  as  the  official  source  of 
information  in  the  Federal  Government  regarding  standards  issued  by  NBS  pursuant  to  the  Federal  Property 
and  Administrative  Services  Act  of  1949  as  amended.  Public  Law  89-306  (79  Stat.  1127),  and  as  implemented 
by  Executive  Order  11717  (38  FR  12315,  dated  May  11,  1973)  and  Part  6  of  Title  15  CFR  (Code  of  Federal 
Regulations). 

NBS  -Interagency  Reports  (NBSIR) — A  special  series  of  interim  or  final  reports  on  work  performed  by  NBS 
for  outside  sponsors  (both  government  and  non-government).  In  general,  initial  distribution  is  handled  by  the 
sponsor;  public  distribution  is  by  the  National  Technical  Information  Service,  Springfield,  VA  22161,  in  paper 
copy  or  microfiche  form. 
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U.S.  Department  of  Commerce 

National  Bureau  of  Standards 
Gaithersburg,  MD  20899 

Official  Business 
Penalty  for  Private  Use  $300 


